Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks
Abstract
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to multiclass problems as well as a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
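To make the phrase "EG updates ... minimizing a convex function with simplex constraints" concrete, here is a minimal sketch of a generic exponentiated gradient step in Python. It is not the paper's dual objective for CRFs or max-margin Markov networks. The toy quadratic, the learning rate, the step count, and the names `eg_step`, `minimize_on_simplex`, `toy_grad`, and `target` are all assumptions chosen for illustration.

```python
import math

def eg_step(u, grad, eta):
    """One exponentiated gradient update: u_i <- u_i * exp(-eta * grad_i), renormalized."""
    # Work in log space and subtract the max before exponentiating for numerical stability.
    logits = [math.log(ui) - eta * gi for ui, gi in zip(u, grad)]
    shift = max(logits)
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    # Renormalizing keeps the iterate on the probability simplex.
    return [w / total for w in weights]

def minimize_on_simplex(grad_fn, dim, eta=0.5, steps=500):
    """Batch EG from the uniform distribution (illustrative defaults, not tuned)."""
    u = [1.0 / dim] * dim
    for _ in range(steps):
        u = eg_step(u, grad_fn(u), eta)
    return u

# Hypothetical toy objective: f(u) = 0.5 * sum_i (u_i - target_i)^2.
# Its minimizer over the simplex is `target` itself, since `target` lies on the simplex.
target = [0.7, 0.2, 0.1]

def toy_grad(u):
    return [ui - ti for ui, ti in zip(u, target)]

solution = minimize_on_simplex(toy_grad, dim=3)
print(solution)  # expected to be close to `target`
```

The multiplicative form of the update keeps every coordinate strictly positive and the renormalization keeps the iterate summing to one, so no projection step is needed. In the structured setting described in the abstract, one such simplex would exist per training example, and the online variant would update one example's distribution at a time.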
Similar resources
Exponentiated Gradient Algorithms for Large-margin Structured Classification
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-fami...
Chunking with Max-Margin Markov Networks
In this paper, we apply Max-Margin Markov Networks (M3Ns) to English base phrases chunking, which is a large margin approach combining both the advantages of graphical models (such as Conditional Random Fields, CRFs) and kernel-based approaches (such as Support Vector Machines, SVMs) to solve the problems of multi-label multi-class supervised classification. To show the efficiency of M3Ns, we co...
Large margin methods for structured classification: Exponentiated gradient algorithms and PAC-Bayesian generalization bounds
We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of both Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (G...
Conditional Random Fields
In this report, we investigate Conditional Random Fields (CRFs), a family of conditionally trained undirected graphical models. We give an overview of linear CRFs that correspond to chain-shaped models and show how the marginals, partition function and MAP-labelings can be computed. Then, we discuss various approaches for training such models ranging from the traditional method of maximizing th...
Journal title:
- Journal of Machine Learning Research
Volume 9
Publication date: 2008